Assessing model error in counterfactual worlds

Howerton, Emily, Lessler, Justin

arXiv.org Artificial Intelligence

Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, scenario projections are rarely evaluated retrospectively. Differences between projections and observations come from two sources: scenario deviation and model miscalibration. We argue the latter is most important for assessing the value of models in decision making, but requires estimating model error in counterfactual worlds. Here we present and contrast three approaches for estimating this error, and demonstrate the benefits and limitations of each in a simulation experiment. We provide recommendations for the estimation of counterfactual error and discuss the components of scenario design that are required to make scenario projections evaluable.


Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection

Sam, Steven, DAbreo, Silima Marshal

arXiv.org Artificial Intelligence

Department of Computer Science, College of Engineering, Design and Physical Science, Brunel University London, steven.sam@brunel.ac.uk. Abstract: Agriculture constitutes a primary source of food production, economic growth and employment in India, but the sector is confronted with low farm productivity and yields aggravated by increased pressure on natural resources and adverse climate change variability. Efforts involving green revolution, land irrigations, improved seeds and organic farming have yielded suboptimal outcomes. The adoption of innovative computational solutions such as crop recommendation systems is considered a new frontier to provide insights and help farmers adapt and address the challenge of low productivity. However, existing agricultural recommendation systems have predominantly focused on environmental factors and narrow geographical coverage in India, resulting in limited and robust predictions of suitable crops with both maximum yields and profits. This work incorporates both environmental and economic factors and 19 crop varieties across 15 states as input parameters to develop and evaluate two recommendation modules - Random Forest (RF) and Support Vector Machines (SVM) - using 10-fold Cross Validation, Time-series Split and Lag Variables approaches. Results show that the 10-fold cross validation approach produced exceptionally high accuracy (RF: 99.96%, SVM: 94.71%), raising concerns of overfitting. However, the introduction of temporal order, which aligns more with real-world scenarios, reduces the model performance (RF: 78.55%, SVM: 71.18%) in the Time-series Split approach. To further increase the model accuracy while maintaining the temporal order, the Lag Variables approach was employed, which resulted in improved performance (RF: 83.62%, SVM: 74.38%) compared to the Time-series Split approach.
Consequently, the study shows the Random Forest model developed based on the Lag Variables as the most preferred algorithm for optimal crop recommendation in the Indian context. Keywords: Crop recommendation model; Random forest; Support vector machines; Indian agriculture; Exploratory data analysis. 1. Introduction: Agriculture is not only fundamental for food production but also constitutes a primary source for economic growth, employment and improvement of the wellbeing of many people globally. For example, the World Bank reports that agriculture constitutes about 4% of the world's total gross domestic product (GDP), and in certain least developed nations, its contribution to GDP exceeds 25%.
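The difference between shuffled k-fold validation and a temporally ordered split with lag variables can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `add_lag_features` and `expanding_window_splits`, the number of lags, and the fold sizes are all assumptions, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def add_lag_features(series: Sequence[float], n_lags: int) -> List[Tuple[List[float], float]]:
    """Pair each value with its n_lags predecessors: ([y[t-n_lags], ..., y[t-1]], y[t])."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((list(series[t - n_lags:t]), series[t]))
    return rows


def expanding_window_splits(n_samples: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs in which every test index follows all train indices."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("too few samples for the requested number of splits")
    splits = []
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_splits else train_end + fold
        splits.append((list(range(train_end)), list(range(train_end, test_end))))
    return splits


# Hypothetical yearly values for one crop in one state (illustrative numbers only).
yearly_values = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0]
lagged_rows = add_lag_features(yearly_values, n_lags=2)
splits = expanding_window_splits(len(yearly_values), n_splits=3)
```

Because each training window ends before its test window begins, the model never sees future observations during training, which is the property that a shuffled 10-fold split does not guarantee.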


Evaluation of Remote Driver Performance in Urban Environment Operational Design Domains

Hans, Ole, Walter, Benedikt, Adamy, Jürgen

arXiv.org Artificial Intelligence

Remote driving has emerged as a solution for enabling human intervention in scenarios where Automated Driving Systems (ADS) face challenges, particularly in urban Operational Design Domains (ODDs). This study evaluates the performance of Remote Drivers (RDs) of passenger cars in a representative urban ODD in Las Vegas, focusing on the influence of cumulative driving experience and targeted training approaches. Using performance metrics such as efficiency, braking, acceleration, and steering, the study shows that driving experience can lead to noticeable improvements in RD performance and demonstrates how experience up to 600 km correlates with improved vehicle control. In addition, driving efficiency exhibited a positive trend with increasing kilometers, particularly during the first 300 km of experience, and reaches a plateau from 400 km within a range of 0.35 to 0.42 km/min in the defined ODD. The research further compares ODD-specific training methods, where the detailed ODD training approach attains notable advantages over other training approaches. The findings underscore the importance of tailored ODD training in enhancing RD performance, safety, and scalability for Remote Driving Systems (RDS) in real-world applications, while identifying opportunities for optimizing training protocols to address both routine and extreme scenarios. The study provides a robust foundation for advancing RDS deployment within urban environments, contributing to the development of scalable and safety-critical remote operation standards.


Leveraging Large Language Models for Automated Causal Loop Diagram Generation: Enhancing System Dynamics Modeling through Curated Prompting Techniques

Liu, Ning-Yuan Georgia, Keith, David R.

arXiv.org Artificial Intelligence

Transforming a dynamic hypothesis into a causal loop diagram (CLD) is crucial for System Dynamics Modelling. Extracting key variables and causal relationships from text to build a CLD is often challenging and time-consuming for novice modelers, limiting SD tool adoption. This paper introduces and tests a method for automating the translation of dynamic hypotheses into CLDs using large language models (LLMs) with curated prompting techniques. We first describe how LLMs work and how they can make the inferences needed to build CLDs using a standard digraph structure. Next, we develop a set of simple dynamic hypotheses and corresponding CLDs from leading SD textbooks. We then compare the four different combinations of prompting techniques, evaluating their performance against CLDs labeled by expert modelers. Results show that for simple model structures and using curated prompting techniques, LLMs can generate CLDs of a similar quality to expert-built ones, accelerating CLD creation.
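To make the "standard digraph structure" concrete, a CLD can be represented as a directed graph whose edges carry a link polarity of +1 or -1. The sketch below is an assumption about one reasonable encoding, not the representation used in the paper; the variable names and the `loop_polarity` helper are hypothetical. It relies only on the standard System Dynamics rule that a loop is reinforcing when the product of its link polarities is positive and balancing when it is negative.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Signed digraph for a textbook-style population example (illustrative only).
links: Dict[Edge, int] = {
    ("population", "births"): +1,
    ("births", "population"): +1,
    ("population", "deaths"): +1,
    ("deaths", "population"): -1,
}


def loop_polarity(links: Dict[Edge, int], loop: List[str]) -> str:
    """Classify a closed loop, given as an ordered list of variables, by multiplying link signs."""
    sign = 1
    # Pair each variable with its successor, wrapping around to close the loop.
    for src, dst in zip(loop, loop[1:] + loop[:1]):
        sign *= links[(src, dst)]
    return "reinforcing" if sign > 0 else "balancing"


birth_loop = loop_polarity(links, ["population", "births"])
death_loop = loop_polarity(links, ["population", "deaths"])
```

An LLM-generated CLD in this form could be compared with an expert-labeled one edge by edge, since both reduce to sets of (source, target, sign) triples.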


Enhanced Sentiment Analysis of Iranian Restaurant Reviews Utilizing Sentiment Intensity Analyzer & Fuzzy Logic

Rokhva, Shayan, Teimourpour, Babak, Babaei, Romina

arXiv.org Artificial Intelligence

This research presents an advanced sentiment analysis framework studied on Iranian restaurant reviews, combining fuzzy logic with conventional sentiment analysis techniques to assess both sentiment polarity and intensity. A dataset of 1266 reviews, alongside corresponding star ratings, was compiled and preprocessed for analysis. Initial sentiment analysis was conducted using the Sentiment Intensity Analyzer (VADER), a rule-based tool that assigns sentiment scores across positive, negative, and neutral categories. However, a noticeable bias toward neutrality often led to an inaccurate representation of sentiment intensity. To mitigate this issue, based on a fuzzy perspective, two refinement techniques were introduced, applying square-root and fourth-root transformations to amplify positive and negative sentiment scores while maintaining neutrality. This led to three distinct methodologies: Approach 1, utilizing unaltered VADER scores; Approach 2, modifying sentiment values using the square root; and Approach 3, applying the fourth root for further refinement. A Fuzzy Inference System incorporating comprehensive fuzzy rules was then developed to process these refined scores and generate a single, continuous sentiment value for each review based on each approach. Comparative analysis, including human supervision and alignment with customer star ratings, revealed that the refined approaches significantly improved sentiment analysis by reducing neutrality bias and better capturing sentiment intensity. Despite these advancements, minor over-amplification and persistent neutrality in domain-specific cases were identified, leading us to propose several future studies to tackle these occasional barriers. The study's methodology and outcomes offer valuable insights for businesses seeking a more precise understanding of consumer sentiment, enhancing sentiment analysis across various industries.
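The effect of the root transformations can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the helper `refine_scores` is hypothetical, the score dictionary mirrors the `pos`/`neg`/`neu` keys that VADER reports, and the abstract does not say whether scores are renormalized afterwards, so no renormalization is applied here. Because the scores lie in [0, 1], taking a root increases any value strictly between 0 and 1, which is what amplifies weak positive and negative signals.

```python
from typing import Dict


def refine_scores(scores: Dict[str, float], root: int) -> Dict[str, float]:
    """Apply a root to the positive and negative scores; leave the neutral score unchanged.

    root=1 corresponds to Approach 1 (unaltered), root=2 to Approach 2 (square root),
    and root=4 to Approach 3 (fourth root).
    """
    if root < 1:
        raise ValueError("root must be a positive integer")
    refined = dict(scores)
    for key in ("pos", "neg"):
        value = scores[key]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} score must lie in [0, 1]")
        refined[key] = value ** (1.0 / root)
    return refined


# Hypothetical scores for a mostly neutral review.
raw = {"pos": 0.25, "neg": 0.0625, "neu": 0.6875}
approach_1 = refine_scores(raw, root=1)
approach_2 = refine_scores(raw, root=2)
approach_3 = refine_scores(raw, root=4)
```

In this example the positive score rises from 0.25 to 0.5 under the square root, while the neutral score is untouched, so the refined values give the downstream fuzzy rules a stronger polarity signal to work with.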


Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

Bartoldson, Brian R., Diffenderfer, James, Parasyris, Konstantinos, Kailkhura, Bhavya

arXiv.org Artificial Intelligence

This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR10 as an example, SOTA clean accuracy is about $100$%, but SOTA robustness to $\ell_{\infty}$-norm bounded perturbations barely exceeds $70$%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field. For instance, we discovered that SOTA methods diverge notably from compute-optimal setups, using excess compute for their level of robustness. Leveraging a compute-efficient setup, we surpass the prior SOTA with $20$% ($70$%) fewer training (inference) FLOPs. We trained various compute-efficient models, with our best achieving $74$% AutoAttack accuracy ($+3$% gain). However, our scaling laws also predict robustness slowly grows then plateaus at $90$%: dwarfing our new SOTA by scaling is impractical, and perfect robustness is impossible. To better understand this predicted limit, we carry out a small-scale human evaluation on the AutoAttack data that fools our top-performing model. Concerningly, we estimate that human performance also plateaus near $90$%, which we show to be attributable to $\ell_{\infty}$-constrained attacks' generation of invalid images not consistent with their original labels. Having characterized limiting roadblocks, we outline promising paths for future research.


Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024

Thi, Thuy Nguyen, Viet, Anh Nguyen, Van, Thin Dang, Thuy, Ngan Nguyen Luu

arXiv.org Artificial Intelligence

This paper describes our systems for Sub-task I in the Software Mention Detection in Scholarly Publications shared task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework: (1) Entity Sentence Classification - classifies sentences containing potential software mentions; (2) Entity Extraction - detects mentions within classified sentences; (3) Entity Type Classification - categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. As a result, our framework based on the XLM-R-based model achieves a weighted F1-score of 67.80%, delivering our team the 3rd rank in Sub-task I for the Software Mention Recognition task.
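For readers unfamiliar with the reported metric, a weighted F1-score averages the per-class F1 values using each class's share of the true labels as its weight. The sketch below shows that calculation on made-up labels; the function name `weighted_f1` and the label names are hypothetical, and the shared task's official scorer may differ in details such as how entity spans are matched.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Average per-class F1 scores, weighting each class by its support in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never predicted correctly.
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * count
    return total / len(y_true)


# Illustrative labels only; these are not the shared task's entity types.
gold = ["application", "application", "application", "language"]
predicted = ["application", "application", "language", "language"]
score = weighted_f1(gold, predicted)
```

Here the majority class has F1 = 0.8 and the minority class has F1 = 2/3, so the weighted average is 0.75 * 0.8 + 0.25 * 2/3.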


Universal Auto-encoder Framework for MIMO CSI Feedback

So, Jinhyun, Kwon, Hyukjoon

arXiv.org Artificial Intelligence

Existing auto-encoder (AE)-based channel state information (CSI) frameworks have focused on a specific configuration of user equipment (UE) and base station (BS), and thus the input and output sizes of the AE are fixed. However, in the real-world scenario, the input and output sizes may vary depending on the number of antennas of the BS and UE and the allocated resource block in the frequency dimension. A naive approach to support the different input and output sizes is to use multiple AE models, which is impractical for the UE due to the limited HW resources. In this paper, we propose a universal AE framework that can support different input sizes and multiple compression ratios. The proposed AE framework significantly reduces the HW complexity while providing comparable performance in terms of compression ratio-distortion trade-off compared to the naive and state-of-the-art approaches.